feat(seo): site-wide SEO optimization (sitemap / JSON-LD / canonical / robots) #289
longsizhuo merged 1 commit into main
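**New structured data (JSON-LD):**
- Global WebSite + SearchAction (so Google may show a site search box under the search result)
- docs pages: TechArticle + BreadcrumbList (technical-article rich result plus breadcrumb hierarchy)
- /u/[username] pages: Person (candidate for a personal-profile knowledge panel)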
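
For reference, a minimal sketch of what the global WebSite + SearchAction object could look like. The helper name `buildWebSiteJsonLd`, the `siteUrl` parameter, and the `/search?q=` path are assumptions for illustration, not taken from this PR.

```typescript
// Hypothetical helper: builds a schema.org WebSite object with a SearchAction.
// The "/search?q=" route is an assumed search URL, not the site's confirmed one.
function buildWebSiteJsonLd(siteUrl: string, siteName: string) {
  return {
    "@context": "https://schema.org",
    "@type": "WebSite",
    name: siteName,
    url: siteUrl,
    potentialAction: {
      "@type": "SearchAction",
      target: {
        "@type": "EntryPoint",
        // {search_term_string} is the placeholder the search engine fills in.
        urlTemplate: `${siteUrl}/search?q={search_term_string}`,
      },
      "query-input": "required name=search_term_string",
    },
  };
}
```

The returned object would then be serialized into a `<script type="application/ld+json">` tag in the root layout.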
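**Sitemap expansion (from home page + docs only to 312 entries):**
- Add a /rank entry (changeFreq=daily)
- Add /u/{githubId} entries (enumerates every contributor in the leaderboard JSON; non-contributor profiles stay out of the sitemap to save crawl budget)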
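
A rough sketch of how the extra entries might be assembled, assuming the leaderboard JSON is an array of rows carrying a `githubId`. The types `SitemapEntry` and `LeaderboardRow`, the function name, and the `weekly` frequency for profiles are all assumptions; only the `daily` frequency for /rank comes from the description above.

```typescript
// Local stand-in for the entry shape of Next.js's MetadataRoute.Sitemap.
type SitemapEntry = {
  url: string;
  lastModified?: Date;
  changeFrequency?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
  priority?: number;
};

// Hypothetical shape of one leaderboard row; only githubId is used here.
type LeaderboardRow = { githubId: string | number };

function buildExtraSitemapEntries(baseUrl: string, leaderboard: LeaderboardRow[]): SitemapEntry[] {
  const rank: SitemapEntry = { url: `${baseUrl}/rank`, changeFrequency: "daily" };

  const seen = new Set<string>();
  const profiles: SitemapEntry[] = [];
  for (const row of leaderboard) {
    const id = String(row.githubId);
    if (seen.has(id)) continue; // skip duplicate contributors
    seen.add(id);
    // "weekly" is an assumed value; the PR does not state one for profiles.
    profiles.push({ url: `${baseUrl}/u/${id}`, changeFrequency: "weekly" });
  }

  return [rank, ...profiles];
}
```

Only IDs present in the leaderboard produce a URL, which matches the intent of keeping non-contributor profiles out of the sitemap.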
**canonical + hreflang:**
- docs [...slug] pages: canonical points to the original slug path; alternates.languages declares zh-CN / en-US / x-default
- /u/[username]: canonical uses the numeric githubId path so that the github_<id> and numeric URLs do not compete for PageRank
- /rank, /login, /settings each get a canonical
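
The profile canonical rule can be sketched as below, assuming the route parameter arrives either as `github_<id>` or as the bare numeric ID. The function name `profileCanonicalPath` is hypothetical, and returning `null` for anything else is an illustrative choice rather than the PR's actual behavior.

```typescript
// Hypothetical helper: maps both URL forms of a profile to one canonical path.
// Returns null when the parameter is not a numeric GitHub ID in either form.
function profileCanonicalPath(usernameParam: string): string | null {
  const match = /^(?:github_)?(\d+)$/.exec(usernameParam);
  return match ? `/u/${match[1]}` : null;
}
```

Both `/u/github_114939201` and `/u/114939201` would then declare the same canonical, so ranking signals consolidate on a single URL.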
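**robots changes:**
- Remove nocache: true (it was suppressing rich snippets)
- For googleBot, allow max-image-preview=large / max-snippet=-1 so Google decides the snippet length itself
- /login and /settings set index=false (login and preferences pages do not need to be indexed)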
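
As an illustration, the resulting robots settings might look roughly like this. The local types stand in for the robots portion of Next.js page metadata, and the `index: true` / `follow: true` values on the global config are assumptions; only the googleBot preview limits and `index: false` come from the list above.

```typescript
// Local stand-in types for the robots part of Next.js page metadata.
type RobotsDirectives = {
  index?: boolean;
  follow?: boolean;
  nocache?: boolean;
  "max-image-preview"?: "none" | "standard" | "large";
  "max-snippet"?: number;
  "max-video-preview"?: number;
};
type RobotsConfig = RobotsDirectives & { googleBot?: RobotsDirectives };

// Global config: no `nocache`, and Google is allowed large previews.
const globalRobots: RobotsConfig = {
  index: true, // assumed
  follow: true, // assumed
  googleBot: {
    index: true, // assumed
    follow: true, // assumed
    "max-image-preview": "large",
    "max-snippet": -1, // -1 means no limit on snippet length
  },
};

// Per-page override for /login and /settings.
const privatePageRobots: RobotsConfig = { index: false };
```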
**per-page metadata:**
- /rank gets title / description / OG
- /u/[username] OG overrides the global og/cover.png with the user's avatarUrl
- docs page OG adds type=article, and locale follows the current language
Pull request overview
This PR implements site-wide SEO improvements for the Next.js app by expanding sitemap coverage, adding structured data (JSON-LD), and tightening canonical/robots metadata to reduce duplicate indexing and improve rich results eligibility.
Changes:
- Add JSON-LD structured data for the site (WebSite + SearchAction), docs pages (TechArticle + BreadcrumbList), and user profiles (Person).
- Expand sitemap.xml to include /rank and contributor profile pages (/u/<githubId>) sourced from the build-time leaderboard JSON.
- Add/adjust per-page metadata (canonical, OG/Twitter, robots noindex) for /rank, /u/[username], /login, /settings, and docs pages.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| app/u/[username]/page.tsx | Adds canonical/OG/Twitter for profiles and injects Person JSON-LD. |
| app/sitemap.ts | Adds /rank and contributor /u/<id> entries to the sitemap. |
| app/settings/page.tsx | Adds canonical + robots: noindex metadata for settings page. |
| app/rank/page.tsx | Adds canonical + title/description + OG metadata for rank page. |
| app/login/page.tsx | Adds canonical + robots: noindex metadata for login page. |
| app/layout.tsx | Updates global robots directives and adds global WebSite + SearchAction JSON-LD. |
| app/docs/[...slug]/page.tsx | Adds docs JSON-LD and sets canonical/hreflang + OG/Twitter metadata. |
    ...(user.githubId
      ? { sameAs: [`https://github.com/${user.githubId}`] }
      : {}),
sameAs is being set to https://github.com/${user.githubId} but githubId is a numeric GitHub user id; GitHub profile URLs use the login/username, so this link will be incorrect. Consider omitting sameAs unless you have the GitHub login (or using a verified GitHub profile URL from user preferences) to avoid emitting invalid structured data.
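
One way to follow this suggestion, sketched under the assumption that a GitHub login might be available on the user record. The `githubLogin` field, the `ProfileUser` type, and the helper name are hypothetical; the PR only shows `githubId`.

```typescript
// Hypothetical user shape: githubLogin is an assumed field, not one shown in the PR.
type ProfileUser = {
  name: string;
  githubId?: string;
  githubLogin?: string;
};

function buildPersonJsonLd(user: ProfileUser, profileUrl: string): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Person",
    name: user.name,
    url: profileUrl,
    // Emit sameAs only when the login is known; a numeric ID does not form a
    // valid github.com profile URL, so it is never used here.
    ...(user.githubLogin
      ? { sameAs: [`https://github.com/${user.githubLogin}`] }
      : {}),
  };
}
```

With only a numeric `githubId` on record, the object simply carries no `sameAs` key instead of pointing at a wrong profile.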
    return (
      <>
        <script
          type="application/ld+json"
          // eslint-disable-next-line react/no-danger
          dangerouslySetInnerHTML={{ __html: JSON.stringify(personJsonLd) }}
The JSON-LD script is built from user-controlled fields (e.g. preferences.bio, user.username) and injected via dangerouslySetInnerHTML with a raw JSON.stringify(...). If any field contains </script> (or <), it can break out of the script tag and enable XSS. Escape unsafe characters in the serialized JSON-LD (commonly replacing < with \u003c, and also >/& as needed) before injecting.
Suggested change:

    const personJsonLdString = JSON.stringify(personJsonLd)
      .replace(/</g, "\\u003c")
      .replace(/>/g, "\\u003e")
      .replace(/&/g, "\\u0026")
      .replace(/\u2028/g, "\\u2028")
      .replace(/\u2029/g, "\\u2029");
    return (
      <>
        <script
          type="application/ld+json"
          // eslint-disable-next-line react/no-danger
          dangerouslySetInnerHTML={{ __html: personJsonLdString }}
    // Normalize slug → canonical path. Whether the user visits /docs/ai/rl (original)
    // or /docs/ai/rl.en (translation), both point to the original slug, so the two
    // URLs do not compete for the PageRank of the same content.
    const slugPath = (slug ?? []).join("/");
    const canonical = slugPath ? `/docs/${slugPath}` : "/docs";
canonical is currently derived from the raw route params (slugPath). Because the docs tree includes language-suffixed routes like *.zh / *.en (e.g. 01-static-array.zh), this will set canonical to the language-suffixed URL, which contradicts the intent in the comment (and the PR description) to consolidate ranking onto the base slug. Consider normalizing the canonical by stripping a trailing .en/.zh from the last slug segment (and using that normalized slug consistently for alternates.canonical / OG url).
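
A minimal sketch of the normalization described here, assuming the language suffix only ever appears on the last slug segment and is limited to `.en` / `.zh`. The function name `canonicalDocsPath` is hypothetical.

```typescript
// Hypothetical helper: strips a trailing .en/.zh from the last slug segment
// so translated routes share the canonical of the base document.
function canonicalDocsPath(slug: string[] | undefined): string {
  const parts = [...(slug ?? [])];
  const last = parts.pop();
  if (last !== undefined) {
    parts.push(last.replace(/\.(en|zh)$/, ""));
  }
  const slugPath = parts.join("/");
  return slugPath ? `/docs/${slugPath}` : "/docs";
}
```

The same normalized value could feed both `alternates.canonical` and the OG `url` so the two never disagree.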
    <script
      type="application/ld+json"
      // eslint-disable-next-line react/no-danger
      dangerouslySetInnerHTML={{ __html: JSON.stringify(articleJsonLd) }}
    />
    <script
      type="application/ld+json"
      // eslint-disable-next-line react/no-danger
      dangerouslySetInnerHTML={{ __html: JSON.stringify(breadcrumbJsonLd) }}
    />
The JSON-LD blobs are injected with dangerouslySetInnerHTML and a raw JSON.stringify(...). If any doc frontmatter field ever contains </script> / < (titles/descriptions are user-editable content in this repo), it can break out of the script tag. Escape unsafe characters in the serialized JSON-LD (at least replace < with \u003c) before injecting.
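
Since the same escaping is needed on both the profile and docs pages, it could live in one shared helper. The sketch below assumes a hypothetical name `serializeJsonLd`; the escape set mirrors the replacements suggested for the profile page.

```typescript
// Hypothetical shared helper: serializes JSON-LD so it is safe to place inside
// a <script> tag. The escaped output is still valid JSON with identical meaning.
function serializeJsonLd(data: Record<string, unknown>): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c") // prevents "</script>" from closing the tag
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028") // line separator
    .replace(/\u2029/g, "\\u2029"); // paragraph separator
}
```

Each page would then pass `serializeJsonLd(articleJsonLd)` (or the breadcrumb / person object) as the `__html` value instead of a raw `JSON.stringify(...)`.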
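Summary

**New structured data (JSON-LD)**
- Global WebSite + SearchAction: Google may show a site search box directly under the search result (Sitelinks Search Box)
- docs pages TechArticle + BreadcrumbList: technical-article rich result plus breadcrumb hierarchy
- /u/[username] pages Person: candidate for a personal-profile knowledge panel, including a sameAs GitHub link

**Sitemap expansion (from ~300 to 312 entries, adding rank + contributor profiles)**
- /rank entry
- /u/{githubId}: enumerates every contributor in the leaderboard JSON (non-contributor profiles stay out of the sitemap to save crawl budget)

**canonical + hreflang**
- docs [...slug]: canonical points to the original slug path; alternates.languages declares zh-CN / en-US / x-default
- /u/[username]: canonical uses the numeric githubId path so that the github_<id> and numeric URLs do not compete for PageRank
- /rank / /login / /settings each get a canonical

**robots changes**
- Remove nocache: true (it was suppressing rich snippets)
- googleBot: max-image-preview=large, max-snippet=-1, max-video-preview=-1 so Google decides the snippet length itself
- /login and /settings set index=false (login and preferences pages do not need to be indexed)

**per-page metadata**
- /rank gets title / description / OG
- /u/[username] OG uses the user's avatarUrl in place of the global og/cover.png, with the Twitter card kept in sync
- docs page OG: type=article, and locale follows the current language

**Test plan**
- pnpm typecheck passes
- curl localhost:3010/sitemap.xml → 312 entries, including /rank and 21 /u/* entries
- curl /u/114939201: HTML contains Person JSON-LD
- curl /docs/ai/llm-basics/pytorch: HTML contains TechArticle + BreadcrumbList JSON-LD + canonical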